JACKKNIFING THE KAPLAN-MEIER SURVIVAL ESTIMATOR
FOR CENSORED DATA: SIMULATION RESULTS AND ASYMPTOTIC ANALYSIS

Donald P. Gaver
Rupert G. Miller, Jr.

1. Introduction

Censored data problems arise frequently in medical applications, and also in engineering system reliability. For example, in medical survivorship studies some subjects may be lost to follow-up, or the available data may be analyzed before all subjects have expired. In the equipment reliability context, observed units may still be in operation, perhaps after several previous failures, at the time of the analysis. Considerable attention has recently been devoted to developing informative statistical methods for handling data of this type (see Kalbfleisch and Prentice (1980)).

It is straightforward, though sometimes computationally tedious, to deal with censoring in a parametric manner, i.e. by assuming a specific form for the lifetime distribution (exponential, Weibull, lognormal, or whatever) and then estimating parameters, perhaps by maximum likelihood. The approach adopted here is, instead, to begin with the Kaplan-Meier (1958) product-limit estimator of survival probability. This estimator is the nonparametric maximum likelihood estimator of a distribution function from a sample of singly-censored data. Then, since the jackknife technique has been shown to be widely useful for obtaining robust confidence intervals (cf. Miller (1974)), it is applied to the Kaplan-Meier estimate in order to obtain approximate confidence intervals for the survival probability. It is reasonable to argue that if the jackknife is to be valid under complex censoring, it must perform correctly in this simplest of all situations, and that if it does work here, it is likely to work in more complex settings as well. In this sense we are reporting the results of a pilot study of an attractive procedure.

In this paper the effect of jackknifing the Kaplan-Meier estimate is examined both by Monte Carlo simulation (sampling experiments) and by asymptotic analysis. In Section 4 we report the results of some extensive Monte Carlo investigations comparing confidence limits for survival probability obtained via the jackknife with those from other techniques. It will be seen that the jackknife seems to perform well for moderate sample sizes, even under some rather unusual conditions. In Section 5, asymptotic results are reported that provide theoretical underpinnings for the jackknife procedure, at least for large sample sizes. Specifically, it is shown that the jackknifed estimate is approximately normal with the asymptotically correct variance, and hence produces asymptotically correct confidence limits from the Kaplan-Meier estimate. Taken by itself, this result may not be terribly important, because an expression for the variance of the estimator is known, and it can be estimated by substituting estimates of any unknown functions into the expression. However, for doubly censored data (cf. Turnbull (1974)), and for data with censoring and truncation (cf. Turnbull (1978)), the situation is more complex. The fact that the jackknife works in the singly censored case makes it more likely that it works for these more complex censoring patterns, and for others as well.


It should be noted that the bootstrap procedure, a resampling approach investigated by Efron (1979, 1981), is also applicable to complex censoring situations; in a particular case investigated, it apparently gives results in good agreement with Greenwood's formula.
2. Formulation of the Problem: the Kaplan-Meier Estimate


Suppose $x_1, x_2, \ldots, x_n$ are n observed survival times, e.g. of medical patients or of equipment subject to failure. Some of these observations are complete lifetimes (failure times), but others are not, having been censored by the time of observation. For short, we refer to complete observations as deaths and to censored observations as losses. Censoring simply means that a "complete time" is not observed, although a "partial time," up to the censoring, is. Censoring complicates the problem of estimating the theoretical survival probability to time x, denoted by $\bar F^0(x) = 1 - F^0(x)$.

Kaplan and Meier (1958) furnish a maximum likelihood estimate $\bar F^0_n(x)$ of $\bar F^0(x)$ from among the class of admissible distributions. This product-limit estimate may be written in several equivalent ways, assuming no ties among the observations:


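$$\bar F^0_n(x) = \prod_{i:\,x_i < x}\left(\frac{n - r_i}{n - r_i + 1}\right)^{\delta_i} \qquad (2.1,a)$$

$$= \prod_{i=1}^{n}\left(\frac{n - i}{n - i + 1}\right)^{\delta_i(x)} \qquad (2.1,b)$$

$$= \prod_{i=1}^{k(x)} \hat p_i = \prod_{i=1}^{k(x)}\left(\frac{n_i - \delta_i}{n_i}\right) \qquad (2.1,c)$$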


In (2.1,a), $r_i$ is the rank of $x_i$ among the ordered observations $x_{(1)} < x_{(2)} < \cdots < x_{(n)}$, and $\delta_i$ is unity if $x_i$ is an observed death, being zero otherwise. In (2.1,b),

$$\delta_i(x) = \begin{cases} 1 & \text{if } x_{(i)} < x \text{ and } x_{(i)} \text{ is a time of death (uncensored)},\\ 0 & \text{otherwise}. \end{cases} \qquad (2.2)$$

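In (2.1,c), $n_i\,(= n - (i-1))$ represents the number of items exposed (to either death or loss) at the ith ordered time, $\delta_i$ is the death indicator of the ith ordered observation, $\hat p_i = (n_i - \delta_i)/n_i$ is the estimated conditional probability of surviving past that time, and $k(x)$ is the total number of ordered observations (deaths or losses) preceding x. Losses contribute factors of 1, so only the deaths among the first $k(x)$ observations change the product.

A numerical example helps to explain the estimate. Suppose the data points are

1 < 2* < 4 < 5* < 7* < 8 < 10

where the starred measurements are losses and the rest are deaths. Let us estimate the survival probability to or beyond x = 6. Here n = 7 and k(6) = 4 (the observations 1, 2*, 4, 5* precede x = 6, two of them deaths), so

$$\bar F^0_7(6) = \left(\frac{6}{7}\right)\left(\frac{4}{5}\right) = \frac{24}{35} \approx 0.686 \quad \text{by (2.1,b)},$$

$$\bar F^0_7(6) = \left(\frac{7-1}{7}\right)\left(\frac{6-0}{6}\right)\left(\frac{5-1}{5}\right)\left(\frac{4-0}{4}\right) = \frac{24}{35} \quad \text{by (2.1,c)}.$$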


Note that, by definition (2.2), the estimate jumps down immediately after data values that are deaths, does not jump at losses, and remains constant between down-jumps. Technically, $\bar F^0_n(x)$ is a left-continuous, monotonically non-increasing step function; this makes $F^0_n(x) = 1 - \bar F^0_n(x)$, the estimated distribution of time of death, left-continuous as well.
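The product-limit computation can be checked mechanically. The following is a minimal Python sketch of form (2.1,c), assuming untied data and the left-continuous convention of (2.2); the function name `km_survival` is our own, not from the paper. It reproduces the worked example above.

```python
from typing import Sequence


def km_survival(times: Sequence[float], deaths: Sequence[bool], x: float) -> float:
    """Product-limit estimate of survival to time x, following (2.1,c).

    Assumes no ties. Only observations strictly below x contribute, which
    gives the left-continuous step function described in the text.
    """
    n = len(times)
    order = sorted(range(n), key=lambda j: times[j])
    surv = 1.0
    for i, j in enumerate(order, start=1):
        if times[j] >= x:
            break                        # observations at or beyond x do not contribute
        n_i = n - (i - 1)                # items exposed at the ith ordered time
        delta_i = 1 if deaths[j] else 0  # a loss contributes a factor of 1
        surv *= (n_i - delta_i) / n_i
    return surv


# Data of the worked example; losses (starred values) have deaths=False.
times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
print(km_survival(times, deaths, 6.0))  # 24/35, about 0.686
```

Evaluating at x = 1 returns 1 (no observation lies strictly below 1), and evaluating beyond 10 returns 0, because the last item at risk is a death.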
3. Interval Estimates for the Kaplan-Meier Estimate

For a given set of data the K.-M. estimate provides a point estimate of the survival probability. It is, of course, desirable to assess the stability of such an estimate under reasonable assumptions about the origin of the data; specifically, it is useful to furnish approximate confidence intervals for a survival probability $\bar F^0(x)$. The jackknife procedure (see Miller (1974) and Mosteller and Tukey (1977)) is one way of producing such limits. In this section we describe the computation of jackknife limits and of confidence limits obtained by alternative procedures; the comparisons, made by simulation, are reported in Section 4.

3.1. The Jackknife Procedure

The jackknife procedure is well described in Mosteller and Tukey (1977), where it is pointed out that a preliminary transformation to approximately symmetrize the sampling distribution of the estimator is beneficial; see also Cressie (1981). For this study we have chosen to utilize the classical "inverse sine" transformation, which tends to stabilize the variance of, and also approximately symmetrize, binomial count data. This transformation is suggested because, in the absence of censoring, the number of sample items surviving a fixed time would be binomial under ideal conditions. Initial experiments with a logistic transformation proved to be less satisfactory, as was a simple log transformation; in practice, both the log and logistic transformations must involve a "start" (see Tukey (1977)), which influences the coverage. A natural choice of start is 1/(2n) (see Cox (1972)), but it results in systematic undercoverage of the confidence intervals, empirically suggesting a larger value. Here is our procedure.
(a) Select a value of x at which to estimate survival probability.

(b) Compute $\bar F^0_n(x)$, e.g. by (2.1).

(c) Compute $A_n(x) = \sin^{-1}\sqrt{\bar F^0_n(x)}$.

(d) Compute $\bar F^0_{n-1,j}(x)$, the K.-M. estimate leaving out the jth observation, whether it be an observed (recorded) death or a loss. The formula actually used was

$$\bar F^0_{n-1,j}(x) = \prod_{i \neq j:\; x_i < x}\left(\frac{n - 1 - r_{i,j}}{n - r_{i,j}}\right)^{\delta_i}, \qquad (3.1)$$

where $r_{i,j}$ is the rank of $x_i$ among the $n-1$ observations remaining after $x_j$ is deleted.

(e) Compute $A_{n-1,j}(x) = \sin^{-1}\sqrt{\bar F^0_{n-1,j}(x)}$.

(f) Compute the jth pseudovalue:

$$V_j = nA_n(x) - (n-1)A_{n-1,j}(x), \qquad j = 1, 2, \ldots, n.$$

(g) Find the mean and variance of the pseudovalues:

$$\bar V = \frac{1}{n}\sum_{j=1}^{n} V_j, \qquad s_V^2 = \frac{1}{n-1}\sum_{j=1}^{n}\left(V_j - \bar V\right)^2, \qquad s_V = \sqrt{s_V^2}.$$

(h) Compute (approximate) two-sided $(1-\alpha)\cdot 100\%$ confidence limits as follows:

$$L = \bar V - t_{1-\alpha/2}(n-1)\,\frac{s_V}{\sqrt n} \;\le\; \sin^{-1}\sqrt{\bar F^0(x)} \;\le\; \bar V + t_{1-\alpha/2}(n-1)\,\frac{s_V}{\sqrt n} = U, \qquad (3.2)$$

where $t_{1-\alpha/2}(n-1)$ is the $(1-\alpha/2)\cdot 100\%$ point of Student's t with $n-1$ degrees of freedom; then invert to obtain (approximate) two-sided $(1-\alpha)\cdot 100\%$ confidence limits for survival beyond x:

$$\sin^2(L) \le \bar F^0(x) \le \sin^2(U). \qquad (3.3)$$

Theoretical justification of this procedure for large n is given in Section 5 of this paper. Its performance is illustrated by the simulation results of Section 4.
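Steps (a) to (h) translate directly into code. Below is a minimal Python sketch, assuming untied data; it carries its own copy of a product-limit helper so that it is self-contained, and all names are hypothetical. The Student-t point is passed in as `t_quantile` to avoid depending on a statistics library, and clipping the transformed limits to $[0, \pi/2]$ before applying (3.3) is our own safeguard, not a step stated in the text.

```python
import math
from typing import List, Sequence, Tuple


def km_survival(times: Sequence[float], deaths: Sequence[bool], x: float) -> float:
    """Product-limit estimate of survival to time x (no ties assumed)."""
    n = len(times)
    surv = 1.0
    for i, j in enumerate(sorted(range(n), key=lambda k: times[k]), start=1):
        if times[j] >= x:
            break
        n_i = n - (i - 1)
        surv *= (n_i - (1 if deaths[j] else 0)) / n_i
    return surv


def jackknife_arcsine_ci(times: List[float], deaths: List[bool], x: float,
                         t_quantile: float) -> Tuple[float, float]:
    """Approximate two-sided limits (3.3) for survival to time x."""
    n = len(times)
    a_n = math.asin(math.sqrt(km_survival(times, deaths, x)))         # step (c)
    pseudo = []
    for j in range(n):                                                 # steps (d)-(f)
        rest_t = times[:j] + times[j + 1:]    # delete the jth observation,
        rest_d = deaths[:j] + deaths[j + 1:]  # whether it is a death or a loss
        a_j = math.asin(math.sqrt(km_survival(rest_t, rest_d, x)))
        pseudo.append(n * a_n - (n - 1) * a_j)
    v_bar = sum(pseudo) / n                                            # step (g)
    s_v = math.sqrt(sum((v - v_bar) ** 2 for v in pseudo) / (n - 1))
    half = t_quantile * s_v / math.sqrt(n)                             # step (h), (3.2)
    lo = min(max(v_bar - half, 0.0), math.pi / 2)  # safeguard (our assumption)
    hi = min(max(v_bar + half, 0.0), math.pi / 2)
    return math.sin(lo) ** 2, math.sin(hi) ** 2                        # (3.3)


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
# 2.447 is the 97.5% point of Student's t with n - 1 = 6 degrees of freedom.
print(jackknife_arcsine_ci(times, deaths, 6.0, t_quantile=2.447))
```

With only seven observations the resulting interval is very wide; the sketch is meant to show the mechanics, in particular that every observation, censored or not, is deleted in turn.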

3.2. Alternatives to the Jackknife: "Greenwood's formula"

The classical estimate of the variance of the estimate $\bar F^0_n(x)$ is given by "Greenwood's formula"; see Kaplan and Meier (1958), p. 477, or Thomas and Grunkemeier (1975), p. 867. Again, when no ties are present, this may be expressed as

$$\widehat{\mathrm{Var}}\left[\bar F^0_n(x)\right] = \left[\bar F^0_n(x)\right]^2 \sum_{i=1}^{k(x)} \frac{\delta_i}{n_i(n_i - \delta_i)}. \qquad (3.4)$$

It is interesting and reassuring that this approximate formula delivers exactly $n^{-1}\bar F^0_n(x)\left[1 - \bar F^0_n(x)\right]$, the binomial variance estimate, when all observed events are deaths.
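The exactness claim follows from a telescoping sum. Assuming no losses and $k = k(x) < n$, every $\delta_i$ in (3.4) equals 1 and $n_i = n - i + 1$, so

$$\sum_{i=1}^{k}\frac{1}{(n-i+1)(n-i)} = \sum_{i=1}^{k}\left(\frac{1}{n-i} - \frac{1}{n-i+1}\right) = \frac{1}{n-k} - \frac{1}{n} = \frac{k}{n(n-k)},$$

and since $\bar F^0_n(x) = (n-k)/n$ in this case,

$$\left[\bar F^0_n(x)\right]^2 \frac{k}{n(n-k)} = \frac{(n-k)^2}{n^2}\cdot\frac{k}{n(n-k)} = \frac{1}{n}\cdot\frac{n-k}{n}\cdot\frac{k}{n} = \frac{\bar F^0_n(x)\left[1 - \bar F^0_n(x)\right]}{n}.$$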

It follows that approximate two-sided $(1-\alpha)\cdot 100\%$ confidence limits may be obtained by this procedure:

a) Select a value of x at which to estimate survival probability.

b) Compute $\bar F^0_n(x)$, the point estimate of survival probability.

c) Compute $s_G = \sqrt{\widehat{\mathrm{Var}}\left[\bar F^0_n(x)\right]}$ from (3.4).

d) Compute approximate two-sided $(1-\alpha)\cdot 100\%$ confidence limits:

$$L_G = \bar F^0_n(x) - z_{1-\alpha/2}\, s_G, \qquad U_G = \bar F^0_n(x) + z_{1-\alpha/2}\, s_G,$$

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)\cdot 100$ percent point of the unit normal. Then

$$L_G < \bar F^0(x) < U_G \qquad (3.5)$$

with approximately the quoted confidence.

For justification of the above procedure when n is large, which we will call the $Z_1$ procedure following Thomas and Grunkemeier (1975), refer to Breslow and Crowley (1974). Simulation results appear in Section 4.
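A minimal Python sketch of the $Z_1$ procedure, assuming untied data; the function name is hypothetical. The normal point $z_{1-\alpha/2}$ comes from the standard library's `statistics.NormalDist`. A term of (3.4) with $n_i = \delta_i = 1$ (a death of the last item at risk) is undefined; the sketch skips it, which yields a zero variance estimate because the survival estimate itself is then zero.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def greenwood_ci(times: Sequence[float], deaths: Sequence[bool], x: float,
                 alpha: float = 0.05) -> Tuple[float, float, Tuple[float, float]]:
    """Return (estimate, Greenwood variance (3.4), limits (L_G, U_G) of (3.5))."""
    n = len(times)
    order = sorted(range(n), key=lambda j: times[j])
    surv, g_sum = 1.0, 0.0
    for i, j in enumerate(order, start=1):
        if times[j] >= x:
            break
        n_i = n - (i - 1)
        d_i = 1 if deaths[j] else 0
        surv *= (n_i - d_i) / n_i
        if d_i and n_i > d_i:            # skip the undefined n_i = delta_i = 1 term
            g_sum += d_i / (n_i * (n_i - d_i))
    var = surv ** 2 * g_sum                              # (3.4)
    s_g = math.sqrt(var)                                 # step c)
    z = NormalDist().inv_cdf(1 - alpha / 2)              # z_{1-alpha/2}
    return surv, var, (surv - z * s_g, surv + z * s_g)   # step d)


# With no losses the Greenwood variance reduces to the binomial form F(1 - F)/n.
times = list(range(1, 11))
deaths = [True] * 10
est, var, (lo, hi) = greenwood_ci(times, deaths, 4.5)
print(est, var, 0.6 * 0.4 / 10)
```

The usage lines check the binomial identity numerically: with ten uncensored observations and x = 4.5, the estimate is 0.6 and the variance matches $0.6 \times 0.4 / 10$.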

3.3. An Approximate Likelihood-Ratio Interval Estimate

Thomas and Grunkemeier (1975) propose the use of a likelihood-ratio based procedure for obtaining approximate $(1-\alpha)\cdot 100\%$ confidence limits. In outline, the procedure approximately maximizes the likelihood of a survival function under a constraint; this will be called the $Z_2$ procedure. For a similar development see Madansky (1965). Specifically, one maximizes the likelihood (5d) of Kaplan and Meier, subject to the constraint that survival to time x equals $\bar F^0$:


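$$\max_{\{p_i\},\,\lambda}\;\sum_{i=1}^{k(x)}\left\{\delta_i\ln(1-p_i) + (n_i-\delta_i)\ln p_i\right\} + \lambda\left(\sum_{i=1}^{k(x)}\ln p_i - \ln\bar F^0\right) + \sum_{i=k(x)+1}^{n}\left\{\delta_i\ln(1-p_i) + (n_i-\delta_i)\ln p_i\right\} \qquad (3.6)$$

giving estimates

$$\hat p_i(\lambda) = \begin{cases} \dfrac{n_i + \lambda - \delta_i}{n_i + \lambda}, & i = 1, \ldots, k(x),\\[1ex] \dfrac{n_i - \delta_i}{n_i}, & i = k(x)+1, \ldots, n, \end{cases} \qquad (3.7)$$

and

$$\prod_{i=1}^{k(x)} \hat p_i(\lambda) = \bar F^0(x;\lambda) \qquad (3.8)$$

from the constraint condition. Next, (numerically) solve the equation

$$\frac{\bar F^0_n(x) - \bar F^0(x;\lambda)}{\bar F^0(x;\lambda)\sqrt{V(\lambda)}} = \pm z_{1-\alpha/2} \qquad (3.9)$$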


for $\lambda_L$ and $\lambda_U$, where $\bar F^0_n$ is the product-limit estimate of survival beyond x, $\bar F^0(x;\lambda)$ is given by (3.8), and $z_{1-\alpha/2}$ is the $(1-\alpha/2)\cdot 100$th percent point of the unit normal distribution. Then, according to Thomas and Grunkemeier (see footnote, p. 867), $V(\lambda)$ may be expressed as follows:

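$$V(\lambda) = \frac{n+\lambda}{n}\sum_{i=1}^{k(x)}\frac{\delta_i}{(n_i+\lambda)(n_i+\lambda-\delta_i)}, \qquad (3.10)$$

or

$$V(\lambda) = \frac{1 - \bar F^0(x;\lambda)}{\bar F^0(x;\lambda)\, n(x)} \quad \text{for } \bar F^0(x;\lambda) \neq 1, \qquad V(\lambda) = 0 \quad \text{for } \bar F^0(x;\lambda) = 1,$$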

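where $n(x)$ is the number of individuals exposed at x. Finally, (approximate) lower and upper confidence limits for $\bar F^0(x)$ are obtained by substituting $\lambda_L$ and $\lambda_U$ into (3.8):

$$P_L = \prod_{i=1}^{k(x)}\frac{n_i + \lambda_L - \delta_i}{n_i + \lambda_L} \quad\text{and}\quad P_U = \prod_{i=1}^{k(x)}\frac{n_i + \lambda_U - \delta_i}{n_i + \lambda_U}. \qquad (3.11)$$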
The principal difficulty in applying this method is the numerical solution of (3.9) for the roots $\lambda_L$ and $\lambda_U$. A Newton-Raphson method was utilized in the program developed for this study. It was feasible to make extensive trials of the procedure only for sample size n = 25.
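For illustration, here is a minimal Python sketch of the $Z_2$ computation, assuming untied data. It substitutes plain bisection for the Newton-Raphson iteration used in the study, because bisection needs only a sign change of the left side of (3.9), and it uses the first expression of (3.10) for $V(\lambda)$. All names are hypothetical. Losses are dropped from the products and sums because they contribute factors of 1 and terms of 0, and the lower root is sought above $1 - \min n_i$ (over deaths before x), where every $\hat p_i(\lambda)$ stays positive.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def death_risk_sets(times: Sequence[float], deaths: Sequence[bool], x: float) -> List[int]:
    """n_i for the deaths observed before x (n_i = n - (i - 1) at ordered time i)."""
    n = len(times)
    order = sorted(range(n), key=lambda j: times[j])
    return [n - (i - 1) for i, j in enumerate(order, start=1)
            if times[j] < x and deaths[j]]


def tg_interval(times: Sequence[float], deaths: Sequence[bool], x: float,
                alpha: float = 0.05, tol: float = 1e-10) -> Tuple[float, float]:
    n = len(times)
    risk = death_risk_sets(times, deaths, x)
    if not risk or min(risk) == 1:
        raise ValueError("need a death before x and a nonzero product-limit estimate")
    z = NormalDist().inv_cdf(1 - alpha / 2)

    def surv(lam: float) -> float:                 # (3.7)-(3.8) with delta_i = 1
        return math.prod((m + lam - 1) / (m + lam) for m in risk)

    km = surv(0.0)                                 # lambda = 0 gives the K.-M. estimate

    def stat(lam: float) -> float:                 # left side of (3.9)
        v = (n + lam) / n * sum(1 / ((m + lam) * (m + lam - 1)) for m in risk)  # (3.10)
        f = surv(lam)
        return (km - f) / (f * math.sqrt(v))

    def bisect(lo: float, hi: float, target: float) -> float:
        # Assumes stat(lo) > target > stat(hi) and keeps that bracket while halving.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if stat(mid) > target:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lam_min = 1 - min(risk)                        # surv(lam) -> 0 as lam -> lam_min
    lam_l = bisect(lam_min + tol, 0.0, z)          # root with stat = +z
    big = 1.0
    while stat(big) > -z:                          # stat decreases without bound
        big *= 2
    lam_u = bisect(0.0, big, -z)                   # root with stat = -z
    return surv(lam_l), surv(lam_u)                # (3.11): P_L, P_U


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
print(tg_interval(times, deaths, 6.0))
```

The limits bracket the product-limit estimate 24/35, as they must, since $\lambda = 0$ reproduces it and the left side of (3.9) is then zero.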
4. Simulation Results

In order to compare the performance of the jackknife procedure with the other candidates described above, namely $Z_1$ and $Z_2$, some of the particular cases treated by Thomas and Grunkemeier (1975, p. 168ff.) were simulated, and nominal 95% and 90% confidence limits were constructed. We summarize the results in the following tables. Note that interval performance is assessed at three probability-of-survival levels, 0.75, 0.50, and 0.25, for each combination of death and loss distributions.
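To make the design concrete, the following Python sketch estimates the coverage of the jackknife limits of Section 3.1 for one cell of a simulation of this kind. The settings (unit exponential death times, independent uniform (0, 1.5) loss times, n = 25, target survival 0.50, nominal 95%) mirror Case 5, but the replication count, the seed, and the helper names are our own assumptions, and the printed figure is not one of the paper's tabulated results. Since $\bar F^0(x) = e^{-x}$ for unit exponential deaths, survival 0.50 corresponds to $x = \ln 2$.

```python
import math
import random


def km_survival(times, deaths, x):
    """Product-limit estimate of survival to time x (no ties assumed)."""
    n = len(times)
    surv = 1.0
    for i, j in enumerate(sorted(range(n), key=lambda k: times[k]), start=1):
        if times[j] >= x:
            break
        n_i = n - (i - 1)
        surv *= (n_i - (1 if deaths[j] else 0)) / n_i
    return surv


def jackknife_arcsine_ci(times, deaths, x, t_quantile):
    """Limits (3.3) from steps (a)-(h) of Section 3.1."""
    n = len(times)
    a_n = math.asin(math.sqrt(km_survival(times, deaths, x)))
    pseudo = [n * a_n - (n - 1) * math.asin(math.sqrt(
        km_survival(times[:j] + times[j + 1:], deaths[:j] + deaths[j + 1:], x)))
        for j in range(n)]
    v_bar = sum(pseudo) / n
    s_v = math.sqrt(sum((v - v_bar) ** 2 for v in pseudo) / (n - 1))
    half = t_quantile * s_v / math.sqrt(n)
    lo = min(max(v_bar - half, 0.0), math.pi / 2)   # clipping is our own safeguard
    hi = min(max(v_bar + half, 0.0), math.pi / 2)
    return math.sin(lo) ** 2, math.sin(hi) ** 2


def coverage(n=25, loss_upper=1.5, target=0.50, t_quantile=2.064, reps=200, seed=1):
    """Fraction of simulated samples whose interval contains the true survival."""
    rng = random.Random(seed)
    x = -math.log(target)                      # true survival exp(-x) equals target
    hits = 0
    for _ in range(reps):
        death = [rng.expovariate(1.0) for _ in range(n)]
        loss = [rng.uniform(0.0, loss_upper) for _ in range(n)]
        times = [min(d, c) for d, c in zip(death, loss)]   # observed x_i as in (5.1)
        deaths = [d <= c for d, c in zip(death, loss)]     # death indicators
        lo, hi = jackknife_arcsine_ci(times, deaths, x, t_quantile)
        hits += lo <= target <= hi
    return hits / reps


# 2.064 is the 97.5% point of Student's t with 24 degrees of freedom.
print(coverage())
```

Repeating the loop for the other survival levels, loss distributions, and interval methods gives a grid of the kind tabulated below; with only a few hundred replications the Monte Carlo error in each coverage figure is itself a couple of percentage points.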

Examination of the tabulations of confidence limit coverage, and also of the averages and standard deviations of confidence-interval widths, suggests that the jackknife (JK) confidence intervals perform in a generally conservative manner compared with the "Greenwood's formula" results ($Z_1$) and the approximate likelihood-ratio method ($Z_2$). That is, JK tends to over-cover, while $Z_1$ consistently under-covers; $Z_2$ has some tendency to under-cover with severe losses (Case 1) and for small probabilities of survival, but generally performs well. Of the three procedures, $Z_2$ is by far the most difficult and expensive to carry out; the computer time involved in computing $Z_2$ for n = 50 prohibited tabulation of those results for this study. Note that the tendency of the jackknife to over-cover is reduced as the probability of survival decreases. Absurdly low values actually occur for survival probabilities 0.50 and 0.25 in Case 1; they are a consequence of the severe censoring assumed. In general, the results obtained indicate that the jackknife procedure is a worthy competitor of "Greenwood's formula" under present circumstances, and that it performs only a little less effectively than does the approximate likelihood-based procedure $Z_2$. The jackknife technique presented here tends to be conservative.

[Table: Case 2. $X_i$ (death times) independent unit exponential; $Y_i$ (loss times) ...]

[Table: Case 4. $X_i$ (death times) independent unit exponential; $Y_i$ (loss times) independent uniform (0,1); sample size n = 50.]

[Table: Case 5. $X_i$ (death times) independent unit exponential; $Y_i$ (loss times) independent uniform (0,1.5); sample size n = 25.]

In order to supplement the above information, a number of additional simulations were made to investigate the effect of departure from the random censoring model. Specifically, the censoring time $Y_i$ was allowed to depend probabilistically upon the time of death $X_i$ for a sequence of experiments. A selection of the results obtained is shown next.

In the above situations, in which $X_i$ and $Y_i$ are now contrived to be positively dependent, once again the jackknife tends to result in over-coverage, i.e. it is conservative, sometimes radically so. This is to be contrasted with the Greenwood's formula results, $Z_1$, which generally under-cover. Here there is some indication that the likelihood-ratio procedure, $Z_2$, has a tendency to under-cover when the survival probability is near 0.5. Of course, all results are for rather small sample sizes and refer to exponentially distributed deaths.
5. Summary of Theoretical Developments

In this section a probability model for random censoring is introduced. In terms of this model it will be shown that the jackknife produces asymptotically correct confidence limits for the survival probability from the Kaplan-Meier estimator. A priori, one could not be certain whether to delete each observation in turn when applying the jackknife, or to delete only the uncensored ones. Our results show that the proper method is to delete each observation, censored or uncensored.

5.1. The Model

Let $X_1^0, X_2^0, \ldots, X_n^0$ be independent random variables distributed according to cdf $F^0(x)$, which is continuous with $F^0(0) = 0$. In medical applications $X_i^0$ represents the survival time of the ith patient, and in engineering reliability it represents the time to failure of the ith equipment (or the ith time to failure of an equipment, when appropriate). The problem is to estimate $F^0$, but unfortunately the $X_i^0$ are not all directly observable.

Let $Y_1, Y_2, \ldots, Y_n$ be independent random variables, identically distributed according to cdf G, the latter being continuous with $G(0) = 0$. The observable variables are then

$$x_i = \min\{X_i^0, Y_i\} \quad\text{and}\quad \delta_i = I\{X_i^0 \le Y_i\}, \qquad (5.1)$$

where $I\{A\}$ is the indicator function for event A. The $Y_i$ variables represent censoring times and are assumed to be independent of the $X_i^0$. The statistician actually observes the smaller of the two variables, and also knows whether the observation is uncensored (a "death") or censored (a "loss").

5.2. Cumulative Hazard Function

The Kaplan-Meier estimator $\bar F^0_n$ is closely related to the sample cumulative hazard function (chf). The latter is defined as

$$A^0_n(x) = \sum_{i=1}^{n}\frac{\delta_{(i)}(x)}{n-i+1}, \qquad (5.2)$$

where $\delta_{(i)}(x)$ is defined in (2.2). In fact, Breslow and Crowley (1974) show that

$$-\ln\left[1 - F^0_n(x)\right] = A^0_n(x) + O(1/n), \qquad (5.3)$$

and it may be shown that $A^0_n(x)$ converges to

$$A^0(x) = \int_0^x \frac{dF^0(u)}{1 - F^0(u)}, \qquad (5.4)$$

the integral of the hazard function $\lambda^0(u)$, where $\lambda^0(u)\,du = dF^0(u)/[1 - F^0(u)]$. Both (5.3) and (5.4) justify the name given to $A^0_n$.

It is convenient to show that the jackknifed estimator of $\bar F^0$, denoted by $\tilde{\bar F}^0_n(x)$, is asymptotically normal by starting with $A^0_n$. If one shows that the jackknifed version of $A^0_n(x)$ is asymptotically normally distributed, then it follows that $\tilde{\bar F}^0_n(x)$ is also asymptotically normal, as is true of other sufficiently smooth functions (e.g. the arc sine) of $\bar F^0_n(x)$. If, in addition, it is shown that the jackknife variance is consistent, then the jackknife confidence procedure described in Section 3 and illustrated in Section 4 is justified for large sample sizes.
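Relation (5.3) can be seen numerically on the worked example of Section 2. The Python sketch below, whose helper names are hypothetical, computes the sample chf (5.2) and $-\ln \bar F^0_n(x)$ from (2.1,b). Since $-\ln(1-u) \ge u$, each death contributes at least as much to the second quantity as to the first.

```python
import math
from typing import List, Sequence


def ordered_deltas(times: Sequence[float], deaths: Sequence[bool], x: float) -> List[int]:
    """delta_(i)(x) of (2.2), listed in order of the observed times."""
    order = sorted(range(len(times)), key=lambda j: times[j])
    return [1 if (times[j] < x and deaths[j]) else 0 for j in order]


def sample_chf(d: Sequence[int]) -> float:
    """A_n^0(x) of (5.2); d[i - 1] holds delta_(i)(x)."""
    n = len(d)
    return sum(d[i - 1] / (n - i + 1) for i in range(1, n + 1))


def minus_log_km(d: Sequence[int]) -> float:
    """-ln of (2.1,b); infinite if the last item at risk dies before x."""
    n = len(d)
    if d[-1]:
        return math.inf
    return -sum(math.log((n - i) / (n - i + 1)) for i in range(1, n + 1) if d[i - 1])


times = [1, 2, 4, 5, 7, 8, 10]
deaths = [True, False, True, False, False, True, True]
d = ordered_deltas(times, deaths, 6.0)
print(sample_chf(d), minus_log_km(d))   # 1/7 + 1/5 versus -ln(24/35)
```

For this sample the two values differ by roughly 0.03; the gap comes from the second- and higher-order terms of the logarithmic expansion used in Section 5.3.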
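5.3. Asymptotic Normality

Let $A^0_{n-1}(x;i)$ be the sample chf when the ith ordered observation is deleted from the sample. Then

$$A^0_{n-1}(x;i) = \sum_{j=1}^{i-1}\frac{\delta_{(j)}(x)}{n-j} + \sum_{j=i+1}^{n}\frac{\delta_{(j)}(x)}{n-j+1}. \qquad (5.5)$$

The corresponding pseudo-value is

$$\tilde A^0_n(x;i) = nA^0_n(x) - (n-1)A^0_{n-1}(x;i) = \frac{n\,\delta_{(i)}(x)}{n-i+1} - \sum_{j=1}^{i-1}\frac{(j-1)\,\delta_{(j)}(x)}{(n-j)(n-j+1)} + \sum_{j=i+1}^{n}\frac{\delta_{(j)}(x)}{n-j+1}. \qquad (5.6)$$

The jackknifed estimator is the average of the pseudo-values. From (5.6), interchanging the order of summation in the double sums,

$$\tilde A^0_n(x) = \frac{1}{n}\sum_{i=1}^{n}\tilde A^0_n(x;i) = \sum_{i=1}^{n}\frac{\delta_{(i)}(x)}{n-i+1} - \frac{1}{n}\sum_{i=2}^{n}\sum_{j=1}^{i-1}\frac{(j-1)\,\delta_{(j)}(x)}{(n-j)(n-j+1)} + \frac{1}{n}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{\delta_{(j)}(x)}{n-j+1}$$

$$= A^0_n(x) - \frac{1}{n}\sum_{j=1}^{n-1}\frac{(j-1)\,\delta_{(j)}(x)}{n-j+1} + \frac{1}{n}\sum_{j=1}^{n}\frac{(j-1)\,\delta_{(j)}(x)}{n-j+1}$$

$$= A^0_n(x) + \frac{n-1}{n}\,\delta_{(n)}(x). \qquad (5.7)$$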
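Identities (5.6) and (5.7) are exact and can be checked by brute force. The Python sketch below, with hypothetical helper names, uses exact rational arithmetic: it recomputes each deleted-sample chf directly, then compares the pseudo-values with the closed form (5.6) and their mean with (5.7). Deleting an observation does not change any other $\delta_{(j)}(x)$, so the deleted sample's chf is just (5.2) applied to the shortened list of indicators.

```python
from fractions import Fraction
from typing import List, Sequence


def chf(d: Sequence[int]) -> Fraction:
    """Sample chf (5.2); d[k] holds delta_(k+1)(x), so n - i + 1 = n - k."""
    n = len(d)
    return sum((Fraction(d[k], n - k) for k in range(n)), Fraction(0))


def pseudo_values(d: Sequence[int]) -> List[Fraction]:
    """n A_n - (n - 1) A_{n-1}(x; i), recomputing each deleted-sample chf (5.5)."""
    n = len(d)
    a_n = chf(d)
    return [n * a_n - (n - 1) * chf(list(d[:k]) + list(d[k + 1:])) for k in range(n)]


def pseudo_value_56(d: Sequence[int], k: int) -> Fraction:
    """Closed form (5.6) for ordered observation i = k + 1 (inner index j = m + 1)."""
    n = len(d)
    first = Fraction(n * d[k], n - k)
    middle = sum((Fraction(m * d[m], (n - m - 1) * (n - m)) for m in range(k)), Fraction(0))
    last = sum((Fraction(d[m], n - m) for m in range(k + 1, n)), Fraction(0))
    return first - middle + last


# delta_(i)(x) for x = 6 and x = 100 in the example of Section 2
for d in ([1, 0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 1, 1]):
    n = len(d)
    pv = pseudo_values(d)
    jack = sum(pv, Fraction(0)) / n
    print(jack, chf(d) + Fraction(n - 1, n) * d[-1])   # the two sides of (5.7)
```

The second indicator list has $\delta_{(n)}(x) = 1$, so it exercises the correction term in (5.7) that vanishes in the first case.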


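Thus the jackknifed estimator and the original estimator differ by an asymptotically negligible term: $\delta_{(n)}(x)$ can be nonzero only if every observation falls below x, an event of probability $F(x)^n$, which tends to zero whenever $F(x) < 1$; here F denotes the cdf of the observed $x_i$, so that $1 - F = (1 - F^0)(1 - G)$. Now it has been shown that $A^0_n(x)$ is asymptotically normal with mean $A^0(x)$ and variance

$$\frac{1}{n}\int_0^x \frac{dF^0(u)}{[1 - F(u)][1 - F^0(u)]} \qquad (5.8)$$

(cf. Breslow and Crowley (1974), Theorem 4), and so it follows that $\tilde A^0_n(x)$ has the same asymptotic distribution.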

In order to study the Kaplan-Meier estimator, expand the logarithm:

$$\ln \bar F^0_n(x) = \sum_{i=1}^{n}\delta_{(i)}(x)\ln\left(1 - \frac{1}{n-i+1}\right) = -A^0_n(x) - \frac{1}{2}\sum_{i=1}^{n}\frac{\delta_{(i)}(x)}{(n-i+1)^2} - \cdots. \qquad (5.9)$$

Now jackknife, and observe that jackknifing the second- and higher-order terms in (5.9) leads to expressions which are $o_p(1/\sqrt n)$, and so the jackknifed version of $\ln \bar F^0_n(x)$ has the same asymptotic (normal) distribution as $-A^0_n(x)$. Since $\exp[\ln \bar F^0_n(x)] = \bar F^0_n(x)$, and the exponential function is smooth (possesses a power-series expansion), it may be shown that the normality of the jackknifed version of $\ln \bar F^0_n(x)$ implies that of the jackknifed $\bar F^0_n(x)$. Furthermore, the asymptotic normal distribution has mean $1 - F^0(x)$ and variance

$$\frac{[1 - F^0(x)]^2}{n}\int_0^x \frac{dF^0(u)}{[1 - F(u)][1 - F^0(u)]},$$

where, as in (5.8), F is the cdf of the observed $x_i$.

5.4. Consistency of the Sample Variance

It may be shown that the sample variance of the pseudo-values converges (a.s.) to the correct population variance, further justifying the use of the jackknife for large samples. We merely sketch the demonstration; see Miller (1975) for details. Begin again by considering the pseudo-values obtained by jackknifing the sample cumulative hazard function. From (5.6) and (5.7), the jackknife variance estimate is given by


$$n\,\widehat{\mathrm{Var}}\left[\tilde A^0_n(x)\right] = \frac{1}{n-1}\sum_{i=1}^{n}\left\{\tilde A^0_n(x;i) - \tilde A^0_n(x)\right\}^2$$

$$= \frac{1}{n-1}\sum_{i=1}^{n}\left\{\frac{n\,\delta_{(i)}(x)}{n-i+1} - \sum_{j=1}^{i-1}\frac{(j-1)\,\delta_{(j)}(x)}{(n-j)(n-j+1)} + \sum_{j=i+1}^{n}\frac{\delta_{(j)}(x)}{n-j+1} - A^0_n(x) - \frac{n-1}{n}\,\delta_{(n)}(x)\right\}^2$$

$$= (n-1)\sum_{i=1}^{n}\left\{\frac{\delta_{(i)}(x)}{n-i+1} - \sum_{j=1}^{i-1}\frac{\delta_{(j)}(x)}{(n-j)(n-j+1)} - \frac{\delta_{(n)}(x)}{n}\right\}^2. \qquad (5.10)$$


Now square and study the individual terms. In particular, the first sum of squares is

$$(n-1)\sum_{i=1}^{n}\frac{\delta_{(i)}(x)}{(n-i+1)^2} \;\to\; \int_0^x \frac{dF^0(u)}{[1 - F(u)][1 - F^0(u)]} \quad \text{a.s.},$$

agreeing with the correct value (5.8) multiplied by n. Consequently, the remaining terms must cancel out in the a.s. limit in order for the jackknife variance to function properly. The steps are omitted here; see Miller (1975) for details. Finally, the correctness of the jackknife variance for the sample chf extends to the Kaplan-Meier estimate by the previous arguments. It may also be shown that the jackknife works properly on any estimator that is a sufficiently smooth function of $\bar F^0_n$; in particular, the arc-sine, log, and logistic transformations may all be jackknifed, which justifies the approach taken in Sections 3 and 4.
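The simplification leading to the last line of (5.10) can also be checked exactly. The Python sketch below, again with hypothetical helper names and exact rational arithmetic, computes $(n-1)^{-1}\sum_i\{\tilde A^0_n(x;i) - \tilde A^0_n(x)\}^2$ from brute-force pseudo-values and compares it with the closed form.

```python
from fractions import Fraction
from typing import Sequence


def chf(d: Sequence[int]) -> Fraction:
    """Sample chf (5.2); d[k] holds delta_(k+1)(x)."""
    n = len(d)
    return sum((Fraction(d[k], n - k) for k in range(n)), Fraction(0))


def n_times_jackknife_variance(d: Sequence[int]) -> Fraction:
    """First line of (5.10): squared pseudo-value deviations summed, over n - 1."""
    n = len(d)
    a_n = chf(d)
    pv = [n * a_n - (n - 1) * chf(list(d[:k]) + list(d[k + 1:])) for k in range(n)]
    mean = sum(pv, Fraction(0)) / n
    return sum(((p - mean) ** 2 for p in pv), Fraction(0)) / (n - 1)


def simplified_510(d: Sequence[int]) -> Fraction:
    """Last line of (5.10), with i = k + 1 and j = m + 1."""
    n = len(d)
    total = Fraction(0)
    for k in range(n):
        term = Fraction(d[k], n - k)
        term -= sum((Fraction(d[m], (n - m - 1) * (n - m)) for m in range(k)), Fraction(0))
        term -= Fraction(d[-1], n)
        total += term ** 2
    return (n - 1) * total


d = [1, 0, 1, 0, 0, 1, 1]      # delta_(i)(x) for x = 100 in the example of Section 2
print(n_times_jackknife_variance(d) == simplified_510(d))
```

Exact fractions avoid any doubt about rounding; the check confirms only the algebra of (5.10), not the almost-sure limit, which needs the large-sample argument sketched above.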